Learning Objectives:
- Get started with "dplyr"
- Get to know the basic dplyr verbs: slice(), filter(), select(), mutate(), arrange(), summarise(), group_by()
- Get started with "ggplot2"
- Produce basic plots with ggplot()
Last week you started to manipulate data tables (under the class of "data.frame" objects) using bracket notation, dat[ , ], and the dollar operator, dat$name, in order to select specific rows, columns, or cells. In addition, you have been creating charts with functions like plot(), boxplot(), and barplot(), which are part of the "graphics" package.
In this lab, you will start learning about other approaches to manipulate tables and create statistical charts. We are going to use the functionality of the package "dplyr" to work with tabular data in a more consistent way. This is a fairly recent package introduced a couple of years ago, but it is based on more than a decade of research and work led by Hadley Wickham.
Likewise, to create graphics in a more consistent and visually pleasing way, we are going to use the package "ggplot2", also originally authored by Hadley Wickham, and developed as part of his PhD more than a decade ago.
Use the first hour of the lab to get as far as possible with the material associated with "dplyr". Then use the second hour of the lab to work on graphics with "ggplot2".
While you follow this lab, you may want to open these cheat sheets:
We want you to keep practicing with the command line (e.g. Mac Terminal, Gitbash). Follow the steps listed below to create the necessary subdirectories like those depicted in this scheme:
lab05/
    README.md
    data/
        nba2017-players.csv
    report/
        lab05.Rmd
        lab05.html
    images/
        ... # all the plot files
- Use mkdir to create a directory lab05 for the lab materials
- Use cd to change directory to (i.e. move inside) lab05
- Create the subdirectories data, report, images
- Use ls to list the contents of lab05 and confirm that you have all the subdirectories.
- Use touch to create an empty README.md text file
- Open the README.md file, and then add a brief description of today’s lab, using markdown syntax.
- Change directory to the data/ folder.
- Download the data file with the command curl, and the -O option (letter O)
curl -O https://raw.githubusercontent.com/ucb-stat133/stat133-spring-2018/master/data/nba2017-players.csv
- Use ls to confirm that the csv file is in data/
- Use wc to count the lines of the csv file
- Take a peek at the first rows of the csv file with head
- Take a peek at the last 5 rows of the csv file with tail
Last login: Wed Feb 14 14:58:35 on ttys001
airbears2-10-142-129-9:~ XuewenLi$ mkdir lab05
airbears2-10-142-129-9:~ XuewenLi$ cd lab05
airbears2-10-142-129-9:lab05 XuewenLi$ mkdir data report images
airbears2-10-142-129-9:lab05 XuewenLi$ ls lab05
ls: lab05: No such file or directory
airbears2-10-142-129-9:lab05 XuewenLi$ ls
data images report
airbears2-10-142-129-9:lab05 XuewenLi$ touch
usage: touch [-A [-][[hh]mm]SS] [-acfhm] [-r file] [-t [[CC]YY]MMDDhhmm[.SS]] file ...
airbears2-10-142-129-9:lab05 XuewenLi$ touch READ.md
airbears2-10-142-129-9:lab05 XuewenLi$ cd data
airbears2-10-142-129-9:data XuewenLi$ curl -O https://raw.githubusercontent.com/ucb-stat133/stat133-spring-2018/master/data/nba2017-players.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 39752  100 39752    0     0   111k      0 --:--:-- --:--:-- --:--:--  111k
airbears2-10-142-129-9:data XuewenLi$ wc nba2017-players.csv
     442    1632   39752 nba2017-players.csv
airbears2-10-142-129-9:data XuewenLi$ head -1 nba2017-players.csv
"player","team","position","height","weight","age","experience","college","salary","games","minutes","points","points3","points2","points1"
airbears2-10-142-129-9:data XuewenLi$ tail -5 nba2017-players.csv
"Marquese Chriss","PHO","PF",82,233,19,0,"University of Washington",2941440,82,1743,753,72,212,113
"Ronnie Price","PHO","PG",74,190,33,11,"Utah Valley State College",282595,14,134,14,3,1,3
"T.J. Warren","PHO","SF",80,230,23,2,"North Carolina State University",2128920,66,2048,951,26,377,119
"Tyler Ulis","PHO","PG",70,150,21,0,"University of Kentucky",918369,61,1123,444,21,163,55
"Tyson Chandler","PHO","C",85,240,34,15,"",12415000,47,1298,397,0,153,91
airbears2-10-142-129-9:data XuewenLi$
### Installing packages
I’m assuming that you already installed the packages "dplyr" and "ggplot2". If that’s not the case, then install them from the console with install.packages(), for example install.packages(c("dplyr", "ggplot2")) (do NOT include this command in your Rmd).
Remember that you only need to install a package once! After a package has been installed on your machine, there is no need to call install.packages() again on the same package. What you should always invoke in order to use the functions in a package is the library() function:
# (include these commands in your Rmd file)
# don't forget to load the packages
library(dplyr)
library(ggplot2)
library(readr)
About loading packages: Another rule to keep in mind is to always load any required packages at the very top of your script files (.R or .Rmd or .Rnw files). Avoid calling the library() function in the middle of a script. Instead, load all the packages before anything else.
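To avoid re-installing packages you already have, you can first check which ones are missing. The following is a minimal sketch using base R's requireNamespace(); the helper name missing_packages() is hypothetical and not part of the lab.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages loaded in this lab
needed <- c("dplyr", "ggplot2", "readr")
to_install <- missing_packages(needed)

# Run this interactively on the console, not in your Rmd:
# if (length(to_install) > 0) install.packages(to_install)
to_install
```

An empty result means every package is already installed, so only the library() calls are needed.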
### Path for Images
knitr::opts_chunk$set(echo = TRUE, fig.path = "../images/")
If you don’t specify fig.path, "knitr" will create a default directory to store all the plots produced when knitting an Rmd file. This time, however, we want to have more control over where things are placed. Because you already have a folder images/ as part of the file structure, this is where we want "knitr" to save all the generated graphics.
Notice the use of a relative path fig.path = "../images/". This is because your Rmd file should be inside the folder report/, but the folder images/ is outside report/ (i.e. both folders share the same parent directory).
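To see how the relative path is resolved, here is a small sketch that rebuilds the lab's folder layout inside a temporary directory (an assumption made for illustration, so nothing in your real lab05/ is touched) and checks where "../images" leads when starting from report/.

```r
# Build a throwaway copy of the lab's folder layout inside a temporary directory.
lab_dir    <- file.path(tempdir(), "lab05")
report_dir <- file.path(lab_dir, "report")
images_dir <- file.path(lab_dir, "images")
dir.create(report_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(images_dir, recursive = TRUE, showWarnings = FALSE)

# "../images" is interpreted relative to report/, where the Rmd lives:
# ".." climbs up to lab05/, and "images" goes down into lab05/images/.
from_report <- normalizePath(file.path(report_dir, "..", "images"))

# Both paths point to the same folder.
identical(from_report, normalizePath(images_dir))
```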
nba_data <- '../data/nba2017-players.csv'
dat <- read.csv(nba_data, stringsAsFactors = FALSE)
The data file for this lab is the same one you used last week: nba2017-players.csv.
To import the data in R you can use the base function read.csv(), or you can also use read_csv() from the package "readr":
# with "base" read.csv()
dat <- read.csv(nba_data, stringsAsFactors = FALSE)
# with "readr" read_csv()
dat <- read_csv(nba_data)
### "dplyr" verbs
To make the learning process of "dplyr" gentler, Hadley Wickham proposes beginning with a set of five basic verbs or operations for data frames (each verb corresponds to a function in "dplyr").
I’ve slightly modified Hadley’s list of verbs:
- filter(), slice(), and select(): subsetting and selecting rows and columns
- mutate(): add new variables
- arrange(): reorder rows
- summarise(): reduce variables to values
- group_by(): grouped (aggregate) operations
slice() allows you to select rows by position:
# first three rows
three_rows <- slice(dat, 1:3)
three_rows
## # A tibble: 3 x 15
## player team position height weight age experience
## <chr> <chr> <chr> <int> <int> <int> <int>
## 1 Al Horford BOS C 82 245 30 9
## 2 Amir Johnson BOS PF 81 240 29 11
## 3 Avery Bradley BOS SG 74 180 26 6
## # ... with 8 more variables: college <chr>, salary <dbl>, games <int>,
## # minutes <int>, points <int>, points3 <int>, points2 <int>,
## # points1 <int>
filter() allows you to select rows by condition:
# subset rows given a condition
# (height greater than 85 inches)
gt_85 <- filter(dat, height > 85)
gt_85
## player team position height weight age experience
## 1 Edy Tavares CLE C 87 260 24 1
## 2 Boban Marjanovic DET C 87 290 28 1
## 3 Kristaps Porzingis NYK PF 87 240 21 1
## 4 Roy Hibbert DEN C 86 270 30 8
## 5 Alexis Ajinca NOP C 86 248 28 6
## college salary games minutes points points3 points2
## 1 5145 1 24 6 0 3
## 2 7000000 35 293 191 0 72
## 3 4317720 66 2164 1196 112 331
## 4 Georgetown University 5000000 6 11 4 0 2
## 5 4600000 39 584 207 0 89
## points1
## 1 0
## 2 47
## 3 198
## 4 0
## 5 29
select() allows you to select columns by name:
# columns by name
player_height <- select(dat, player, height)
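The three verbs can be compared side by side on a tiny data frame. This is a minimal sketch: the data frame toy is hypothetical, built from a few values that appear in the outputs above, so it runs without the csv file.

```r
library(dplyr)

# Toy data frame with values copied from the outputs above
toy <- data.frame(
  player = c("Al Horford", "Amir Johnson", "Avery Bradley", "Edy Tavares"),
  team   = c("BOS", "BOS", "BOS", "CLE"),
  height = c(82L, 81L, 74L, 87L),
  stringsAsFactors = FALSE
)

first_two <- slice(toy, 1:2)              # rows by position
tall      <- filter(toy, height > 85)     # rows by condition
names_ht  <- select(toy, player, height)  # columns by name

first_two$player   # "Al Horford" "Amir Johnson"
tall$player        # "Edy Tavares"
names(names_ht)    # "player" "height"
```

Note that slice() and filter() keep all columns and drop rows, while select() keeps all rows and drops columns.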
Use slice() to subset the data by selecting the first 5 rows.
slice(dat, 1:5)
## # A tibble: 5 x 15
## player team position height weight age experience
## <chr> <chr> <chr> <int> <int> <int> <int>
## 1 Al Horford BOS C 82 245 30 9
## 2 Amir Johnson BOS PF 81 240 29 11
## 3 Avery Bradley BOS SG 74 180 26 6
## 4 Demetrius Jackson BOS PG 73 201 22 0
## 5 Gerald Green BOS SF 79 205 31 9
## # ... with 8 more variables: college <chr>, salary <dbl>, games <int>,
## # minutes <int>, points <int>, points3 <int>, points2 <int>,
## # points1 <int>
Use slice() to subset the data by selecting rows 10, 15, 20, …, 50.
# seq() builds 10, 15, ..., 50; 10:50 would select all 41 rows from 10 to 50
slice(dat, seq(10, 50, by = 5))
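Rows 10, 15, 20, …, 50 form a sequence with step 5, which is not the same as the full range 10:50. A minimal sketch of the difference, using a made-up data frame toy whose id column simply records the original row number:

```r
library(dplyr)

# Stand-in data frame with 60 rows; `id` records the original row number
toy <- data.frame(id = 1:60)

every_fifth <- slice(toy, seq(10, 50, by = 5))  # rows 10, 15, ..., 50
whole_range <- slice(toy, 10:50)                # every row from 10 to 50

every_fifth$id      # 10 15 20 25 30 35 40 45 50
nrow(whole_range)   # 41
```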
Use slice() to subset the data by selecting the last 5 rows.
# nrow() gives the number of rows; length(dat) would give the number of columns (15)
n <- nrow(dat)
slice(dat, (n-4):n)
## # A tibble: 5 x 15
## player team position height weight age experience
## <chr> <chr> <chr> <int> <int> <int> <int>
## 1 Marquese Chriss PHO PF 82 233 19 0
## 2 Ronnie Price PHO PG 74 190 33 11
## 3 T.J. Warren PHO SF 80 230 23 2
## 4 Tyler Ulis PHO PG 70 150 21 0
## 5 Tyson Chandler PHO C 85 240 34 15
## # ... with 8 more variables: college <chr>, salary <dbl>, games <int>,
## # minutes <int>, points <int>, points3 <int>, points2 <int>,
## # points1 <int>
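When selecting the last rows by position, keep in mind that length() applied to a data frame counts columns, while nrow() counts rows. A minimal sketch on a made-up data frame (toy and its values are hypothetical):

```r
library(dplyr)

# Stand-in data frame: 8 rows, 2 columns (hypothetical values)
toy <- data.frame(id = 1:8, score = seq(10, 80, by = 10))

length(toy)   # 2: length() of a data frame counts columns
nrow(toy)     # 8: nrow() counts rows

n <- nrow(toy)
last_five <- slice(toy, (n - 4):n)
last_five$id  # 4 5 6 7 8
```

With the NBA data, length(dat) is 15 (the number of columns), so (n-4):n computed from it picks rows 11 to 15 rather than the last five players.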
Use filter() to subset those players with height less than 70 inches tall.
filter(dat, height < 70)
## player team position height weight age experience
## 1 Isaiah Thomas BOS PG 69 185 27 5
## 2 Kay Felder CLE PG 69 176 21 0
## college salary games minutes points points3 points2
## 1 University of Washington 6587132 76 2569 2199 245 437
## 2 Oakland University 543471 42 386 166 7 55
## points1
## 1 590
## 2 35
Use filter() to subset rows of Golden State Warriors (‘GSW’).
GSW <- filter(dat, team == "GSW")
GSW
## player team position height weight age experience
## 1 Andre Iguodala GSW SF 78 215 33 12
## 2 Damian Jones GSW C 84 245 21 0
## 3 David West GSW C 81 250 36 13
## 4 Draymond Green GSW PF 79 230 26 4
## 5 Ian Clark GSW SG 75 175 25 3
## 6 James Michael McAdoo GSW PF 81 230 24 2
## 7 JaVale McGee GSW C 84 270 29 8
## 8 Kevin Durant GSW SF 81 240 28 9
## 9 Kevon Looney GSW C 81 220 20 1
## 10 Klay Thompson GSW SG 79 215 26 5
## 11 Matt Barnes GSW SF 79 226 36 13
## 12 Patrick McCaw GSW SG 79 185 21 0
## 13 Shaun Livingston GSW PG 79 192 31 11
## 14 Stephen Curry GSW PG 75 190 28 7
## 15 Zaza Pachulia GSW C 83 270 32 13
## college salary games minutes points
## 1 University of Arizona 11131368 76 1998 574
## 2 Vanderbilt University 1171560 10 85 19
## 3 Xavier University 1551659 68 854 316
## 4 Michigan State University 15330435 76 2471 776
## 5 Belmont University 1015696 77 1137 527
## 6 University of North Carolina 980431 52 457 147
## 7 University of Nevada, Reno 1403611 77 739 472
## 8 University of Texas at Austin 26540100 62 2070 1555
## 9 University of California, Los Angeles 1182840 53 447 135
## 10 Washington State University 16663575 78 2649 1742
## 11 University of California, Los Angeles 383351 20 410 114
## 12 University of Nevada, Las Vegas 543471 71 1074 282
## 13 5782450 76 1345 389
## 14 Davidson College 12112359 79 2638 1999
## 15 2898000 70 1268 426
## points3 points2 points1
## 1 64 155 72
## 2 0 8 3
## 3 3 132 43
## 4 81 191 151
## 5 61 150 44
## 6 2 60 21
## 7 0 208 56
## 8 117 434 336
## 9 2 54 21
## 10 268 376 186
## 11 18 20 20
## 12 41 65 29
## 13 1 172 42
## 14 324 351 325
## 15 0 164 98
Use filter() to subset rows of GSW centers (‘C’).
filter(GSW, position == "C")
## player team position height weight age experience
## 1 Damian Jones GSW C 84 245 21 0
## 2 David West GSW C 81 250 36 13
## 3 JaVale McGee GSW C 84 270 29 8
## 4 Kevon Looney GSW C 81 220 20 1
## 5 Zaza Pachulia GSW C 83 270 32 13
## college salary games minutes points
## 1 Vanderbilt University 1171560 10 85 19
## 2 Xavier University 1551659 68 854 316
## 3 University of Nevada, Reno 1403611 77 739 472
## 4 University of California, Los Angeles 1182840 53 447 135
## 5 2898000 70 1268 426
## points3 points2 points1
## 1 0 8 3
## 2 3 132 43
## 3 0 208 56
## 4 2 54 21
## 5 0 164 98
Use filter() and then select(), to subset rows of the Lakers (‘LAL’), and then display their names.
LAL <- filter(dat, team == "LAL")
select(LAL, player)
## player
## 1 Brandon Ingram
## 2 Corey Brewer
## 3 D'Angelo Russell
## 4 David Nwaba
## 5 Ivica Zubac
## 6 Jordan Clarkson
## 7 Julius Randle
## 8 Larry Nance Jr.
## 9 Luol Deng
## 10 Metta World Peace
## 11 Nick Young
## 12 Tarik Black
## 13 Thomas Robinson
## 14 Timofey Mozgov
## 15 Tyler Ennis
Use filter() and then select(), to display the name and salary of GSW point guards.
GSW_point <- filter(GSW, position == "PG")
select(GSW_point, player, salary)
## player salary
## 1 Shaun Livingston 5782450
## 2 Stephen Curry 12112359
experience10 <- filter(dat, experience > 10)
salary10m <- filter(experience10, salary < 10000000)
A <- select(salary10m, player, age, team)
A
## player age team
## 1 Dahntay Jones 36 CLE
## 2 Deron Williams 32 CLE
## 3 James Jones 36 CLE
## 4 Kyle Korver 35 CLE
## 5 Richard Jefferson 36 CLE
## 6 Jose Calderon 35 ATL
## 7 Kris Humphries 31 ATL
## 8 Mike Dunleavy 36 ATL
## 9 Jason Terry 39 MIL
## 10 C.J. Miles 29 IND
## 11 Udonis Haslem 36 MIA
## 12 Beno Udrih 34 DET
## 13 David West 36 GSW
## 14 Matt Barnes 36 GSW
## 15 Shaun Livingston 31 GSW
## 16 Zaza Pachulia 32 GSW
## 17 David Lee 33 SAS
## 18 Lou Williams 30 HOU
## 19 Trevor Ariza 31 HOU
## 20 Brandon Bass 31 LAC
## 21 Paul Pierce 39 LAC
## 22 Raymond Felton 32 LAC
## 23 Boris Diaw 34 UTA
## 24 Nick Collison 36 OKC
## 25 Tony Allen 35 MEM
## 26 Vince Carter 40 MEM
## 27 Jameer Nelson 34 DEN
## 28 Mike Miller 36 DEN
## 29 Devin Harris 33 DAL
## 30 Metta World Peace 37 LAL
## 31 Leandro Barbosa 34 PHO
## 32 Ronnie Price 33 PHO
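The answer above filters in two steps. filter() also accepts several conditions in one call, and combines them with a logical AND. A minimal sketch, using a hypothetical data frame toy built from a few values shown in this lab:

```r
library(dplyr)

# Toy rows with values copied from outputs in this lab
toy <- data.frame(
  player     = c("Stephen Curry", "David West", "Kevin Durant", "Zaza Pachulia"),
  experience = c(7L, 13L, 9L, 13L),
  salary     = c(12112359, 1551659, 26540100, 2898000),
  stringsAsFactors = FALSE
)

# Two steps: filter, then filter the result again
step1     <- filter(toy, experience > 10)
two_steps <- filter(step1, salary < 10000000)

# One call: conditions separated by commas are combined with AND
one_call <- filter(toy, experience > 10, salary < 10000000)

identical(two_steps$player, one_call$player)  # TRUE
one_call$player  # "David West" "Zaza Pachulia"
```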
age20 <- filter(dat, age == 20)
rookie <- select(age20, player,team, height, weight)
slice(rookie,1:5)
## # A tibble: 5 x 4
## player team height weight
## <chr> <chr> <int> <int>
## 1 Jaylen Brown BOS 79 225
## 2 Rashad Vaughn MIL 78 202
## 3 Myles Turner IND 83 243
## 4 Justise Winslow MIA 79 225
## 5 Henry Ellenson DET 83 245
### mutate()
Another basic verb is mutate(), which allows you to add new variables. Let’s create a small data frame for the Warriors with three columns: player, height, and weight:
# creating a small data frame step by step
gsw <- filter(dat, team == 'GSW')
gsw <- select(gsw, player, height, weight)
gsw <- slice(gsw, c(4, 8, 10, 14, 15))
gsw
## # A tibble: 5 x 3
## player height weight
## <chr> <int> <int>
## 1 Draymond Green 79 230
## 2 Kevin Durant 81 240
## 3 Klay Thompson 79 215
## 4 Stephen Curry 75 190
## 5 Zaza Pachulia 83 270
Now, let’s use mutate() to (temporarily) add a column with the ratio height / weight:
mutate(gsw, height / weight)
## # A tibble: 5 x 4
## player height weight `height/weight`
## <chr> <int> <int> <dbl>
## 1 Draymond Green 79 230 0.3434783
## 2 Kevin Durant 81 240 0.3375000
## 3 Klay Thompson 79 215 0.3674419
## 4 Stephen Curry 75 190 0.3947368
## 5 Zaza Pachulia 83 270 0.3074074
You can also give a new name, like: ht_wt = height / weight:
mutate(gsw, ht_wt = height / weight)
## # A tibble: 5 x 4
## player height weight ht_wt
## <chr> <int> <int> <dbl>
## 1 Draymond Green 79 230 0.3434783
## 2 Kevin Durant 81 240 0.3375000
## 3 Klay Thompson 79 215 0.3674419
## 4 Stephen Curry 75 190 0.3947368
## 5 Zaza Pachulia 83 270 0.3074074
In order to permanently change the data, you need to assign the changes to an object:
gsw2 <- mutate(gsw, ht_m = height * 0.0254, wt_kg = weight * 0.4536)
gsw2
## # A tibble: 5 x 5
## player height weight ht_m wt_kg
## <chr> <int> <int> <dbl> <dbl>
## 1 Draymond Green 79 230 2.0066 104.328
## 2 Kevin Durant 81 240 2.0574 108.864
## 3 Klay Thompson 79 215 2.0066 97.524
## 4 Stephen Curry 75 190 1.9050 86.184
## 5 Zaza Pachulia 83 270 2.1082 122.472
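The difference between a temporary and a permanent change can be checked directly: mutate() returns a new data frame and leaves its input untouched. A minimal sketch with a hypothetical two-row data frame toy (values taken from the gsw rows above):

```r
library(dplyr)

# Two of the gsw rows shown above
toy <- data.frame(
  player = c("Stephen Curry", "Kevin Durant"),
  height = c(75L, 81L),    # inches
  weight = c(190L, 240L),  # pounds
  stringsAsFactors = FALSE
)

# mutate() returns a new data frame; `toy` itself is left unchanged
toy_metric <- mutate(toy, ht_m = height * 0.0254, wt_kg = weight * 0.4536)

names(toy)         # "player" "height" "weight"
names(toy_metric)  # "player" "height" "weight" "ht_m" "wt_kg"
toy_metric$ht_m    # 1.9050 2.0574
```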
### arrange()
The next basic verb of "dplyr" is arrange(), which allows you to reorder rows. For example, here’s how to arrange the rows of gsw by height:
# order rows by height (increasingly)
arrange(gsw, height)
## # A tibble: 5 x 3
## player height weight
## <chr> <int> <int>
## 1 Stephen Curry 75 190
## 2 Draymond Green 79 230
## 3 Klay Thompson 79 215
## 4 Kevin Durant 81 240
## 5 Zaza Pachulia 83 270
By default arrange() sorts rows in increasing order. To arrange rows in descending order you need to use the auxiliary function desc().
# order rows by height (decreasingly)
arrange(gsw, desc(height))
## # A tibble: 5 x 3
## player height weight
## <chr> <int> <int>
## 1 Zaza Pachulia 83 270
## 2 Kevin Durant 81 240
## 3 Draymond Green 79 230
## 4 Klay Thompson 79 215
## 5 Stephen Curry 75 190
# order rows by height, and then weight
arrange(gsw, height, weight)
## # A tibble: 5 x 3
## player height weight
## <chr> <int> <int>
## 1 Stephen Curry 75 190
## 2 Klay Thompson 79 215
## 3 Draymond Green 79 230
## 4 Kevin Durant 81 240
## 5 Zaza Pachulia 83 270
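desc() can be mixed with plain column names in the same call, so one variable is sorted in decreasing order while another breaks ties in increasing order. A minimal sketch with a hypothetical three-row data frame toy (values taken from the gsw rows above):

```r
library(dplyr)

# Three of the gsw rows shown above; two players share the same height
toy <- data.frame(
  player = c("Draymond Green", "Klay Thompson", "Stephen Curry"),
  height = c(79L, 79L, 75L),
  weight = c(230L, 215L, 190L),
  stringsAsFactors = FALSE
)

# Tallest first; among equal heights, lightest first
sorted <- arrange(toy, desc(height), weight)
sorted$player  # "Klay Thompson" "Draymond Green" "Stephen Curry"
```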
Using the data frame gsw, add a new variable product with the product of height and weight.
mutate(gsw, product = height * weight)
## # A tibble: 5 x 4
## player height weight product
## <chr> <int> <int> <int>
## 1 Draymond Green 79 230 18170
## 2 Kevin Durant 81 240 19440
## 3 Klay Thompson 79 215 16985
## 4 Stephen Curry 75 190 14250
## 5 Zaza Pachulia 83 270 22410
Create a new data frame gsw3, by adding columns log_height and log_weight with the log transformations of height and weight.
gsw3 <- mutate(gsw, log_height = log(height), log_weight = log(weight))
gsw3
## # A tibble: 5 x 5
## player height weight log_height log_weight
## <chr> <int> <int> <dbl> <dbl>
## 1 Draymond Green 79 230 4.369448 5.438079
## 2 Kevin Durant 81 240 4.394449 5.480639
## 3 Klay Thompson 79 215 4.369448 5.370638
## 4 Stephen Curry 75 190 4.317488 5.247024
## 5 Zaza Pachulia 83 270 4.418841 5.598422
Use filter() and arrange() to subset those players with height less than 71 inches tall, in increasing order of height.
newheight <- filter(dat, height < 71)
arrange(newheight, height)
## player team position height weight age experience
## 1 Isaiah Thomas BOS PG 69 185 27 5
## 2 Kay Felder CLE PG 69 176 21 0
## 3 Tyler Ulis PHO PG 70 150 21 0
## college salary games minutes points points3 points2
## 1 University of Washington 6587132 76 2569 2199 245 437
## 2 Oakland University 543471 42 386 166 7 55
## 3 University of Kentucky 918369 61 1123 444 21 163
## points1
## 1 590
## 2 35
## 3 55
B <- select(dat, player,team, salary)
C <- arrange(B, desc(salary))
head(C,3)
## player team salary
## 1 LeBron James CLE 30963450
## 2 Al Horford BOS 26540100
## 3 DeMar DeRozan TOR 26540100
head(C,5)
## player team salary
## 1 LeBron James CLE 30963450
## 2 Al Horford BOS 26540100
## 3 DeMar DeRozan TOR 26540100
## 4 Kevin Durant GSW 26540100
## 5 James Harden HOU 26540100
D <- select(dat, player,team, points3)
E <- arrange(D, desc(points3))
head(E,10)
## player team points3
## 1 Stephen Curry GSW 324
## 2 Klay Thompson GSW 268
## 3 James Harden HOU 262
## 4 Eric Gordon HOU 246
## 5 Isaiah Thomas BOS 245
## 6 Kemba Walker CHO 240
## 7 Bradley Beal WAS 223
## 8 Damian Lillard POR 214
## 9 Ryan Anderson HOU 204
## 10 J.J. Redick LAC 201
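The "top n" answers above sort with arrange() and then keep the first rows with head(). The same result can be obtained with slice(), which stays within the "dplyr" verbs. A minimal sketch on a hypothetical data frame toy (values taken from the output above, deliberately shuffled):

```r
library(dplyr)

# Four players from the output above, in shuffled order
toy <- data.frame(
  player  = c("Eric Gordon", "Stephen Curry", "James Harden", "Klay Thompson"),
  points3 = c(246L, 324L, 262L, 268L),
  stringsAsFactors = FALSE
)

by_threes <- arrange(toy, desc(points3))

top2_head  <- head(by_threes, 2)    # base R: first two rows
top2_slice <- slice(by_threes, 1:2) # dplyr: rows 1 and 2 by position

top2_slice$player  # "Stephen Curry" "Klay Thompson"
```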
Create a data frame gsw_mpg of GSW players, that contains variables for player name, experience, and min_per_game (minutes per game), sorted by min_per_game (in descending order).
# keep only GSW players before computing minutes per game
dat1 <- filter(dat, team == "GSW")
dat1 <- mutate(dat1, min_per_game = minutes / games)
gsw_mpg <- select(dat1, player, experience, min_per_game)
gsw_mpg <- arrange(gsw_mpg, desc(min_per_game))
gsw_mpg
## player experience min_per_game
## 1 Klay Thompson 5 33.961538
## 2 Stephen Curry 7 33.392405
## 3 Kevin Durant 9 33.387097
## 4 Draymond Green 4 32.513158
## 5 Andre Iguodala 12 26.289474
## 6 Matt Barnes 13 20.500000
## 7 Zaza Pachulia 13 18.114286
## 8 Shaun Livingston 11 17.697368
## 9 Patrick McCaw 0 15.126761
## 10 Ian Clark 3 14.766234
## 11 David West 13 12.558824
## 12 JaVale McGee 8 9.597403
## 13 James Michael McAdoo 2 8.788462
## 14 Damian Jones 0 8.500000
## 15 Kevon Looney 1 8.433962
## 433 Georges Niang 0 4.043478
## 434 Chris McCullough 1 4.000000
## 435 Daniel Ochefu 0 3.947368
## 436 Michael Gbinije 0 3.555556
## 437 Diamond Stone 0 3.428571
## 438 Demetrius Jackson 0 3.400000
## 439 Kyle Wiltjer 0 3.142857
## 440 Brice Johnson 0 3.000000
## 441 Roy Hibbert 8 1.833333
summarise()
The next verb is summarise(). Conceptually, this involves applying a function to one or more columns in order to summarize their values. This is probably easier to understand with an example.
Say you are interested in calculating the average salary of all NBA players. To do this “a la dplyr” you use summarise(), or its synonym function summarize():
# average salary of NBA players
summarise(dat, avg_salary = mean(salary))
## avg_salary
## 1 6187014
Calculating an average like this seems a bit verbose, especially when you can directly use mean() like this:
mean(dat$salary)
## [1] 6187014
So let’s make things a bit more interesting. What if you want to calculate some summary statistics for salary: min, median, mean, and max?
# some stats for salary (dplyr)
summarise(
dat,
min = min(salary),
median = median(salary),
avg = mean(salary),
max = max(salary)
)
## min median avg max
## 1 5145 3500000 6187014 30963450
Well, this may still not look like much. You can do the same in base R (there are actually better ways to do this):
# some stats for salary (base R)
c(min = min(dat$salary),
  median = median(dat$salary),
  avg = mean(dat$salary),
  max = max(dat$salary))
## min median avg max
## 5145 3500000 6187014 30963450
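As one possible reading of the "better ways" mentioned above, base R's summary() reports the minimum, quartiles, mean, and maximum in a single call. The sketch below uses a small made-up salary vector (the values are hypothetical, not taken from nba2017-players.csv):

```r
# hypothetical salaries, for illustration only
salary <- c(5000, 20000, 35000, 60000, 100000)

# summary() returns min, quartiles, mean, and max in one call
stats <- summary(salary)
stats

# individual statistics can be extracted by name
stats["Median"]
stats["Mean"]
```

With the real data you would call summary(dat$salary) instead.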
To actually appreciate the power of summarise(), we need to introduce the other major basic verb in "dplyr": group_by(). This is the function that allows you to perform data aggregations, or grouped operations.
Let’s see the combination of summarise() and group_by() to calculate the average salary by team:
# average salary, grouped by team
summarise(
group_by(dat, team),
avg_salary = mean(salary)
)
## # A tibble: 30 x 2
## team avg_salary
## <chr> <dbl>
## 1 ATL 6491892
## 2 BOS 6127673
## 3 BRK 4363414
## 4 CHI 6138459
## 5 CHO 6683086
## 6 CLE 8386014
## 7 DAL 6139880
## 8 DEN 5225533
## 9 DET 6871594
## 10 GSW 6579394
## # ... with 20 more rows
Here’s a similar example with the average salary by position:
# average salary, grouped by position
summarise(
group_by(dat, position),
avg_salary = mean(salary)
)
## # A tibble: 5 x 2
## position avg_salary
## <chr> <dbl>
## 1 C 6987682
## 2 PF 5890363
## 3 PG 6069029
## 4 SF 6513374
## 5 SG 5535260
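To see what the grouping step does on its own, here is a minimal sketch on a tiny made-up table (the data frame toy and its values are hypothetical). It also uses n(), a dplyr helper that counts the rows in each group:

```r
library(dplyr)

# hypothetical toy table, for illustration only
toy <- data.frame(
  team   = c("A", "A", "B", "B", "B"),
  salary = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# group_by() only marks the groups;
# summarise() then returns one row per group
by_team <- summarise(
  group_by(toy, team),
  players    = n(),
  avg_salary = mean(salary)
)
by_team
```

The result has one row for team A and one for team B, no matter how many players each team has.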
Here’s a fancier example: average weight and height, by position, displayed in descending order by average height:
arrange(
summarise(
group_by(dat, position),
avg_height = mean(height),
avg_weight = mean(weight)),
desc(avg_height)
)
## # A tibble: 5 x 3
## position avg_height avg_weight
## <chr> <dbl> <dbl>
## 1 C 83.25843 250.7978
## 2 PF 81.50562 235.8539
## 3 SF 79.63855 220.4699
## 4 SG 77.02105 204.7684
## 5 PG 74.30588 188.5765
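Nested calls like the one above have to be read from the inside out. As an optional alternative (not otherwise used in this section), "dplyr" also makes the pipe operator %>% available, which lets you write the same steps from top to bottom. A minimal sketch on a made-up table (toy and its values are hypothetical):

```r
library(dplyr)

# hypothetical toy table, for illustration only
toy <- data.frame(
  position = c("C", "C", "PG", "PG", "SF"),
  height   = c(84, 82, 74, 76, 80),
  stringsAsFactors = FALSE
)

# nested form, as used in this lab (read from the inside out)
nested <- arrange(
  summarise(group_by(toy, position), avg_height = mean(height)),
  desc(avg_height)
)

# piped form: same steps, read from top to bottom
piped <- toy %>%
  group_by(position) %>%
  summarise(avg_height = mean(height)) %>%
  arrange(desc(avg_height))

# both forms give the same table
all.equal(nested, piped)
```

Which form to use is a matter of taste; the nested form makes the order of evaluation explicit, while the piped form tends to be easier to scan.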
- Use summarise() to get the largest height value.
summarise(dat, max_height = max(height))
## max_height
## 1 87
- Use summarise() to get the standard deviation of points3.
summarise(dat, sd_p3 = sd(points3))
## sd_p3
## 1 55.9721
- Use summarise() and group_by() to display the median of three-points, by team.
summarise(
group_by(dat,team),
median_3p= median(points3)
)
## # A tibble: 30 x 2
## team median_3p
## <chr> <dbl>
## 1 ATL 32.5
## 2 BOS 46.0
## 3 BRK 44.0
## 4 CHI 32.0
## 5 CHO 17.0
## 6 CLE 62.0
## 7 DAL 53.0
## 8 DEN 53.0
## 9 DET 28.0
## 10 GSW 18.0
## # ... with 20 more rows
summarise(
group_by(dat,team),
avg_3p= mean(points3)
)
## # A tibble: 30 x 2
## team avg_3p
## <chr> <dbl>
## 1 ATL 44.71429
## 2 BOS 65.66667
## 3 BRK 49.20000
## 4 CHI 37.66667
## 5 CHO 53.86667
## 6 CLE 67.46667
## 7 DAL 50.26667
## 8 DEN 57.86667
## 9 DET 42.06667
## 10 GSW 65.46667
## # ... with 20 more rows
- Obtain the mean and standard deviation of age, for Power Forwards with between 5 and 10 years of experience (inclusive).
# keep only Power Forwards with 5 to 10 years of experience, then summarise
pf_5_10 <- filter(dat, position == "PF", between(experience, 5, 10))
summarise(
  pf_5_10,
  m_age = mean(age),
  sd_age = sd(age)
)
ggplot()
The package "ggplot2" is probably the most popular package in R to create beautiful static graphics. Compared to the functions in the base package "graphics", the package "ggplot2" follows a somewhat different philosophy, and it tries to be as consistent and modular as possible.
- The main function in "ggplot2" is ggplot()
- The main input to ggplot() is a data frame object.
- Use aes() to specify what columns of the data frame will be used for the graphical elements of the plot.
- Choose the geometric objects (geoms) to display, e.g. geom_point(), geom_bar(), geom_boxplot().
- Build the plot by adding layers with the + operator.
Let’s start with a scatterplot of salary and points:
# scatterplot (option 1)
ggplot(data = dat) +
geom_point(aes(x = points, y = salary))
- ggplot() creates an object of class "ggplot"
- the main input for ggplot() is data which must be a data frame
- then we use the "+" operator to add a layer
- the geometric objects (geom) are points: geom_point()
- aes() is used to specify the x and y coordinates, by taking columns points and salary from the data frame
The same scatterplot can also be created with this alternative, and more common, use of ggplot():
# scatterplot (option 2)
ggplot(data = dat, aes(x = points, y = salary)) +
geom_point()
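To check that the two options really describe the same plot, you can compare the data that each layer will draw. This is a minimal sketch on a made-up table (toy and its values are hypothetical); it relies on layer_data(), a "ggplot2" helper that returns the computed data for a layer:

```r
library(ggplot2)

# hypothetical toy table, for illustration only
toy <- data.frame(
  points = c(100, 250, 400, 800),
  salary = c(1, 2, 4, 9)
)

# option 1: aesthetics declared inside the geom
p1 <- ggplot(data = toy) +
  geom_point(aes(x = points, y = salary))

# option 2: aesthetics declared in ggplot() and inherited by every layer
p2 <- ggplot(data = toy, aes(x = points, y = salary)) +
  geom_point()

# compare the x and y values that each plot will draw
xy1 <- layer_data(p1)[, c("x", "y")]
xy2 <- layer_data(p2)[, c("x", "y")]
all.equal(xy1, xy2)
```

Option 2 becomes more convenient once a plot has several layers, because the aesthetics are written only once.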
Say you want to color code the points in terms of position:
# colored scatterplot
ggplot(data = dat, aes(x = points, y = salary)) +
geom_point(aes(color = position))
Maybe you want to modify the size of the dots in terms of points3:
# sized and colored scatterplot
ggplot(data = dat, aes(x = points, y = salary)) +
geom_point(aes(color = position, size = points3))
To add some transparency effect to the dots, you can use the alpha parameter.
# sized and colored scatterplot
ggplot(data = dat, aes(x = points, y = salary)) +
geom_point(aes(color = position, size = points3), alpha = 0.7)
Notice that alpha was specified outside aes(). This is because we are not using any column for the alpha transparency values.
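The difference between mapping an aesthetic (inside aes()) and setting it to a constant (outside aes()) can be inspected in the data that the layer will draw. A minimal sketch, assuming a made-up table (toy and its values are hypothetical):

```r
library(ggplot2)

# hypothetical toy table, for illustration only
toy <- data.frame(
  points   = c(100, 250, 400, 800),
  salary   = c(1, 2, 4, 9),
  position = c("C", "PG", "PG", "SF"),
  stringsAsFactors = FALSE
)

# color is mapped to a column (inside aes);
# alpha is set to a constant (outside aes)
p <- ggplot(data = toy, aes(x = points, y = salary)) +
  geom_point(aes(color = position), alpha = 0.7)

# every point gets the same alpha, while colour varies with position
drawn <- layer_data(p)
unique(drawn$alpha)
length(unique(drawn$colour))
```

A mapped aesthetic also gets a legend, whereas a constant one does not.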
- Use the data frame gsw to make a scatterplot of height and weight.
ggplot(gsw, aes(height, weight)) +
geom_point()
- Make another scatterplot of height and weight, using geom_text() to display the names of the players.
ggplot(gsw, aes(height, weight)) +
  geom_point() +
  geom_text(aes(label = player), nudge_x = 1, nudge_y = 1, check_overlap = TRUE)
- Get a scatter plot of height and weight, for ALL the warriors, displaying their names with geom_label().
# geom_label() has no check_overlap argument, so it is omitted here
warriors <- filter(dat, team == "GSW")
ggplot(warriors, aes(height, weight)) +
  geom_point() +
  geom_label(aes(label = player))
- Get a density plot of salary (for all NBA players).
ggplot(dat, aes(salary)) +
geom_density()
- Get a histogram of points2 with binwidth of 50 (for all NBA players).
ggplot(dat, aes(points2)) +
geom_histogram(binwidth=50)
- Get a barchart of the position frequencies (for all NBA players).
ggplot(dat, aes(position)) +
geom_bar()
- Make a scatterplot of experience and salary of all Centers, and use geom_smooth() to add a regression line.
Center <- filter(dat, position == "C")
# geom_point() draws the scatterplot; method = "lm" gives a straight regression line
ggplot(Center, aes(experience, salary)) +
  geom_point() +
  geom_smooth(method = "lm")
- Repeat the same scatterplot of experience and salary of all Centers, but now use geom_smooth() to add a loess line (i.e. smooth line).
Center <- filter(dat, position == "C")
ggplot(Center, aes(experience, salary)) +
  geom_point() +
  geom_smooth(method = "loess")
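The "regression line" in these exercises is the least-squares line that lm() would fit. The following sketch, on a made-up table (toy and its values are hypothetical), fits that line directly and checks that geom_smooth(method = "lm") draws the same line:

```r
library(ggplot2)

# hypothetical toy table, for illustration only
toy <- data.frame(
  experience = c(0, 1, 2, 4, 6, 8, 10),
  salary     = c(1, 2, 2, 4, 5, 7, 8)
)

# the least-squares line, fitted directly
fit <- lm(salary ~ experience, data = toy)
coef(fit)

# scatterplot plus straight regression line (se = FALSE hides the confidence band)
p_lm <- ggplot(toy, aes(experience, salary)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# the smooth layer (layer 2) holds points that lie on the fitted line
line_pts <- layer_data(p_lm, 2)
on_line <- all.equal(
  line_pts$y,
  unname(predict(fit, data.frame(experience = line_pts$x)))
)
on_line
```

Replacing method = "lm" with method = "loess" swaps the straight line for a locally fitted curve.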
One of the most attractive features of "ggplot2" is the ability to display multiple facets. The idea of facets is to divide a plot into subplots based on the values of one or more categorical (or discrete) variables.
Here’s an example. What if you want to get scatterplots of points and salary separated (or grouped) by position? This is where faceting comes in handy, and you can use facet_wrap() for this purpose:
# scatterplot by position
ggplot(data = dat, aes(x = points, y = salary)) +
geom_point() +
facet_wrap(~ position)
The other faceting function is facet_grid(), which allows you to control the layout of the facets (by rows, by columns, etc)
# scatterplot by position
ggplot(data = dat, aes(x = points, y = salary)) +
geom_point(aes(color = position), alpha = 0.7) +
facet_grid(~ position) +
geom_smooth(method = loess)
# scatterplot by position
ggplot(data = dat, aes(x = points, y = salary)) +
geom_point(aes(color = position), alpha = 0.7) +
facet_grid(position ~ .) +
geom_smooth(method = loess)
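Behind the scenes, faceting assigns every row of the data to a panel (subplot). A minimal sketch on a made-up table (toy and its values are hypothetical) makes this visible through the PANEL column returned by layer_data():

```r
library(ggplot2)

# hypothetical toy table, for illustration only
toy <- data.frame(
  points   = c(100, 250, 400, 800, 650, 300),
  salary   = c(1, 2, 4, 9, 7, 3),
  position = c("C", "C", "PG", "PG", "SF", "SF"),
  stringsAsFactors = FALSE
)

p <- ggplot(data = toy, aes(x = points, y = salary)) +
  geom_point() +
  facet_wrap(~ position)

# each row of the layer data records the panel it is drawn in
panels <- layer_data(p)$PANEL
table(panels)
```

There is one panel per distinct value of position; facet_wrap() and facet_grid() differ only in how those panels are arranged on the page.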
- Make scatterplots of experience and salary faceting by position
ggplot(data = dat, aes(x = experience, y = salary)) +
geom_point() +
facet_wrap(~ position)
- Make scatterplots of experience and salary faceting by team
ggplot(data = dat, aes(x = experience, y = salary)) +
geom_point() +
facet_wrap(~ team)
- Make density plots of age faceting by team
ggplot(data = dat, aes(age)) +
geom_density() +
facet_wrap(~ team)
- Make scatterplots of height and weight faceting by position
ggplot(data = dat, aes(x = height, y = weight)) +
geom_point() +
facet_wrap(~ position)
- Make scatterplots of height and weight, with a 2-dimensional density, geom_density2d(), faceting by position
ggplot(data = dat, aes(x = height, y = weight)) +
  geom_point() +
  geom_density2d() +
  facet_wrap(~ position)
- Make a scatterplot of experience and salary for the Warriors, but this time add a layer with theme_bw() to get a simpler background
ggplot(data = filter(dat, team == "GSW"), aes(x = experience, y = salary)) +
  geom_point() +
  theme_bw()
- Repeat any of the previous plots but now adding a layer with another theme, e.g. theme_minimal(), theme_dark(), theme_classic()
ggplot(data = dat, aes(x = experience, y = salary)) +
  geom_point() +
  theme_minimal()
Now that you have a bunch of images inside the images/ subdirectory, let’s keep practicing some basic commands.
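- Move inside the images/ directory of the lab.
- Create a directory copies at the parent level (i.e. lab05/).
- Copy one of the PNG files to the copies folder.
- Use the wildcard * to copy all the .png files in the directory copies.
- Change to the directory copies.
- Use mv to rename some of your PNG files.
- Change to the report/ directory.
- From within report/, find out how to rename the directory copies as copy-files.
- From within report/, delete one or two PNG files in copy-files.
- From within report/, find out how to delete the directory copy-files.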